Online task scheduling algorithm for big data analytics based on cumulative running work
LI Yefei, XU Chao, XU Daoqiang, ZOU Yunfeng, ZHANG Xiaoda, QIAN Zhuzhong
Journal of Computer Applications, 2019, 39(8): 2431-2437. DOI: 10.11772/j.issn.1001-9081.2019010073
A task scheduler based on Cumulative Running Work (CRW), named CRWScheduler, was proposed to schedule tasks effectively on big data analytics platforms such as Hadoop and Spark without any prior knowledge of the jobs. Based on its CRW, a running job was moved from a low-weight queue to a high-weight one. When resources were allocated to a job, both the job's queue and its instantaneous resource utilization were considered, which significantly improved overall system performance without requiring prior knowledge. A prototype of CRWScheduler was implemented on Apache Hadoop YARN. Experimental results on a 28-node benchmark cluster show that, compared with the YARN fair scheduler, CRWScheduler reduces the average Job Flow Time (JFT) by 21% and the 95th-percentile JFT by up to 35%. Further improvements can be obtained when CRWScheduler cooperates with task-level schedulers.
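The abstract does not give CRWScheduler's queue thresholds, queue weights or allocation formula, so the Python sketch below only illustrates the general idea under stated assumptions. CRW is accumulated as containers held multiplied by elapsed time, a job's queue is chosen by comparing its CRW against hypothetical thresholds, and the next free container goes to the job with the largest hypothetical score `weight / (1 + running)`, so that both the queue and the job's instantaneous usage matter. The names `Job`, `queue_level`, `accrue` and `pick_next`, and all numeric values, are illustrative and are not taken from the paper or from the YARN API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Job:
    name: str
    crw: float = 0.0   # cumulative running work so far (assumed unit: container-seconds)
    running: int = 0   # containers currently held, i.e. instantaneous usage


def queue_level(crw: float, thresholds: Sequence[float]) -> int:
    """Queue index for a job: how many ascending CRW thresholds it has reached."""
    level = 0
    for t in thresholds:
        if crw < t:
            break
        level += 1
    return level


def accrue(jobs: List[Job], dt: float) -> None:
    """Advance time by dt: each job's CRW grows with the containers it is running."""
    for job in jobs:
        job.crw += job.running * dt


def pick_next(jobs: List[Job], thresholds: Sequence[float],
              weights: Sequence[float]) -> Job:
    """Choose the job that receives the next free container.

    Hypothetical priority: weight of the job's current queue divided by
    (1 + containers it already holds)."""
    def priority(job: Job) -> float:
        return weights[queue_level(job.crw, thresholds)] / (1 + job.running)
    return max(jobs, key=priority)


if __name__ == "__main__":
    thresholds = [10.0, 100.0]   # hypothetical CRW boundaries between three queues
    weights = [1.0, 2.0, 4.0]    # hypothetical; weight rises with queue level, per the abstract
    jobs = [Job("a"), Job("b", crw=50.0, running=3)]
    print(pick_next(jobs, thresholds, weights).name)  # prints a
```

Dividing by current usage is one simple way to keep a job that already holds many containers from monopolizing a heavily weighted queue; the paper's actual policy may differ.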
Fast outlier detection algorithm based on local density
ZOU Yunfeng, ZHANG Xin, SONG Shiyuan, NI Weiwei
Journal of Computer Applications, 2017, 37(10): 2932-2937. DOI: 10.11772/j.issn.1001-9081.2017.10.2932
Outlier mining aims to find exceptional objects that deviate from most of the rest of a data set. Density-based outlier detection has attracted much attention, but the density-based Local Outlier Factor (LOF) algorithm is not suitable for data sets with abnormal distributions. The INFLuenced Outlierness (INFLO) algorithm solves this problem by analyzing both the k nearest neighbors and the reverse k nearest neighbors of each data point, at the cost of lower efficiency. To address this, a local density-based algorithm named Local Density Based Outlier detection (LDBO) was proposed, which improves outlier detection efficiency and effectiveness simultaneously. LDBO introduced the definitions of strong k nearest neighbors and weak k nearest neighbors to analyze the outlier relationships among data points located near each other. Furthermore, to improve detection efficiency, a prejudgement step was applied to avoid unnecessary reverse k nearest neighbor analysis as far as possible. Theoretical analysis and experimental results indicate that LDBO is more efficient than INFLO, and that it is effective and feasible.
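The abstract does not define LDBO's strong and weak k nearest neighbors, so the sketch below instead shows the INFLO baseline it is compared against, in its commonly cited formulation: the density of a point p is 1/kdist(p) (kdist being the distance to its k-th nearest neighbor), its influence space is the union of its k nearest neighbors and reverse k nearest neighbors, and its score is the average density over that space divided by its own density. This is a brute-force O(n²) Python sketch assuming distinct points and n > k; the function name `inflo_scores` is illustrative. Building the reverse neighbor sets requires the kNN sets of every point, which is the kind of analysis LDBO's prejudgement aims to skip where possible.

```python
import math
from typing import List, Sequence, Set


def inflo_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """INFLO score per point (brute force).

    Assumes n > k and no duplicate points, so every k-distance is positive."""
    n = len(points)
    knn: List[Set[int]] = []
    kdist: List[float] = []
    for i, p in enumerate(points):
        # (distance, index) pairs sorted ascending; the index breaks ties deterministically
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        knn.append({j for _, j in ranked[:k]})
        kdist.append(ranked[k - 1][0])  # distance to the k-th nearest neighbor

    # Reverse kNN: q belongs to RkNN(p) exactly when p is one of q's k nearest neighbors.
    rknn: List[Set[int]] = [set() for _ in range(n)]
    for q in range(n):
        for p in knn[q]:
            rknn[p].add(q)

    density = [1.0 / d for d in kdist]  # local density = 1 / k-distance
    scores = []
    for p in range(n):
        space = knn[p] | rknn[p]  # influence space IS_k(p)
        avg_density = sum(density[o] for o in space) / len(space)
        scores.append(avg_density / density[p])
    return scores


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(inflo_scores(pts, k=2))
```

For this toy input the four corners of the unit square score at most 1, while the isolated point (10, 10) scores sqrt(181), about 13.45; scores well above 1 suggest outliers.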